Add comments into the AST #14352

doorgan · 2025-03-21T18:02:23Z

This implements AST comments and updates the formatter to look for comments in the AST instead of a separate comments list.

This adds a new :include_comments to Code.string_to_quoted/2 and Code.string_to_quoted!/2.

Comments are attached to AST nodes in three variants:

:leading_comments: Comments that are located before a node, or in the same line.

Examples:

# This is a leading comment
foo # This one too

:trailing_comments: Comments that are located after a node, and before the end of the parent enclosing the node(or the root document).

Examples:

foo
# This is a trailing comment
# This one too

:inner_comments: Comments that are located inside an empty node.

Examples:

foo do
  # This is an inner comment
end

[
  # This is an inner comment
]

%{
  # This is an inner comment
}

A comment may be considered inner or trailing depending on wether the enclosing node is empty or not. For example, in the following code:

foo do
  # This is an inner comment
end

The comment is considered inner because the do block is empty. However, in the following code:

foo do
  bar
  # This is a trailing comment
end

The comment is considered trailing to bar because the do block is not empty.

In the case no nodes are present in the AST but there are comments, they are inserted into the root :__block__ node as :inner_comments.

Many of the ideas here come from Sourceror, which takes inspiration for the recast parser. In recast, comments are attached to nodes with the leading and trailing boolean flags. If a comment has both attributes set to false, it represents an "inner" comment. For the sake of clarity I decided to define it explicitly for the Elixir AST.
An additional difference is that recast stores all the comments(leading, trailing, inner) mixed together under a single comments field. I find that separating them to be cleaner and easier to work with, since you don't have to split or filter the list to get a certain kind of comment from a node.

Different from Sourceror which supports only :leading_comments and :trailing_comments, it introduces :inner_comments to remove the ambiguity between comments before the end keyword, and comments after an expression. This also becomes important when working on the formatter code, as it is very awkward to inject inner comments after the call args were already converted to algebra docs. The Sourceror implementation relies on the formatter behavior to produce ast and comments that the formatter would interpret correctly, meaning it can get away with a sub-optimal comment merging strategy; it only needs the formatter to work. This PR on the contrary tries to attach comments where they make the most sense, and updates the formatter accordingly instead of wrangling with comment line numbers.

The current implementation in this PR adds an extra traversal step after the code is initially parsed. Due to this, and due to the logic to merge comments and AST requiring precise line information in every AST nodes, it requires to also set up the following options:

[
  literal_encoder: &{:ok, {:__block__, &2, [&1]}},
  token_metadata: true
]

TODO

Add public documentation
Handle formatting of binary operator forms and binary operator chains
Fix cases from the formatter integration tests
Run the formatter on the Elixir codebase itself to ensure nothing changes

josevalim · 2025-03-28T08:04:31Z

lib/elixir/lib/code.ex

+    include_comments = Keyword.get(opts, :include_comments, false)
+
+    Process.put(:code_formatter_comments, [])
+    opts = [preserve_comments: &preserve_comments/5] ++ opts


I think I told you privately that we could make preserve_comments: :ast an option but I now realized that preserve_comments is actually a private option, so the :include_comments option is better.

doorgan · 2025-04-01T02:08:40Z

There's a few issues I still need to tackle and a few more cleanups are pending(mostly around the code for formatting interpolations and the repetition of comment docs generation in general), but I think this is generally ready for an initial round of reviews

There are 3 tests currently failing that are related to binary operators:

  1) test case with when and clause comment (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:298
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
     left:  "case decomposition do\n  <<h, _::binary>> when h != ?< ->\n    decomposition =\n      decomposition\n      |> :binary.split(\" \", [:global])\n      |> Enum.map(&String.to_integer(&1, 16))\n\n    Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n  _ ->\n    dacc\nend"
     right: "case decomposition do\n  # Decomposition\n  <<h, _::binary>> when h != ?< ->\n    decomposition =\n      decomposition\n      |> :binary.split(\" \", [:global])\n      |> Enum.map(&String.to_integer(&1, 16))\n\n    Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n  _ ->\n    dacc\nend"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:299: (test)

  2) test comment inside operator with when (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:590
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(bad, opts)) == result
     left:  "raise # Comment\n      function(x) ::\n        any"
     right: "# Comment\nraise function(x) ::\n        any"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:597: (test)

  3) test first argument in a call without parens with comments (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:468
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
     left:  "with bar ::\n       :ok\n       | :invalid\n       | :other"
     right: "with bar ::\n       :ok\n       | :invalid\n       # | :unknown\n       | :other"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:469: (test)

Once those are fixed, running make format should produce the same results as the formatter would before this change. I found a few scenarios where newlines may be added/removed for some trailing comments but that issue should be relatively easy to fix once I figure it out.

josevalim · 2025-04-01T09:12:31Z

Fantastiac!

Btw, number #2, the code you are generating is semantically wrong.

raise # Comment
         function(x) :: any

is equivalent to:

raise()
function(x) :: any

Just in case it was not clear :) for functions without parens, it is probably safer to move it outside

doorgan · 2025-04-01T13:03:51Z

Yep, that case is definitely wrong.
One subtlety of dealing with binary_op formatting is that in general we want to hoist the operand comments to be above the operator, but if the operator is nested in a function call(in this case raise) then it's harder to produce the comment above the parent doc without also changing the code that handles the parent node. That's what is causing that broken formatting in 2.

doorgan force-pushed the doorgan/ast_comments branch from 3eb21c3 to 5dcb6e7 Compare March 27, 2025 23:13

josevalim reviewed Mar 28, 2025

View reviewed changes

doorgan force-pushed the doorgan/ast_comments branch 4 times, most recently from a1580da to b12a9a3 Compare March 31, 2025 15:56

doorgan marked this pull request as ready for review April 1, 2025 02:08

doorgan added 7 commits April 7, 2025 17:39

Merge comments into the AST

9b9e308

Update formatter to look for AST comments

e3e4fac

Handle more scenarios and update formatter

c9e9a02

Cleanups

67fb06e

Fix getting start line for last arg

05afe40

Fix handling of stab comments

c0e4f8b

Format binary operators comments

c97ee1f

doorgan force-pushed the doorgan/ast_comments branch from 40207bc to c97ee1f Compare April 7, 2025 20:42

doorgan added 4 commits April 11, 2025 17:23

Small cleanups

221f1f8

Fix comments in cons pipe not printing

102fd41

Merge remote-tracking branch 'origin/main' into doorgan/ast_comments

91a91a3

Merge branch 'main' into doorgan/ast_comments

d96f021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add comments into the AST #14352

Add comments into the AST #14352

Uh oh!

doorgan commented Mar 21, 2025 •

edited

Loading

Uh oh!

josevalim Mar 28, 2025

Uh oh!

doorgan commented Apr 1, 2025

Uh oh!

josevalim commented Apr 1, 2025

Uh oh!

doorgan commented Apr 1, 2025

Uh oh!

Uh oh!

Add comments into the AST #14352

Are you sure you want to change the base?

Add comments into the AST #14352

Uh oh!

Conversation

doorgan commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODO

Uh oh!

josevalim Mar 28, 2025

Choose a reason for hiding this comment

Uh oh!

doorgan commented Apr 1, 2025

Uh oh!

josevalim commented Apr 1, 2025

Uh oh!

doorgan commented Apr 1, 2025

Uh oh!

Uh oh!

doorgan commented Mar 21, 2025 •

edited

Loading